CONNECTION TABLES FROM WISWESSER CHEMICAL STRUCTURE NOTATIONS -

A  PARTIAL  ALGORITHM 

George  F.  Fraction* 
Justin  C.  Walker 
Stephen  J.  Tauber 


An  algorithm  has  been  developed  for  transforming  certain 
types  of  Wiswesser  organic  structure  notations  into 
connection  tables.   Acyclic  and  benzene  structures  are 
treated,  and  provision  has  been  made  for  all  of  the 
types  of  contractions  used  by  the  Wiswesser  notation 
system.  A  separate  algorithm  is  presented  for  treating 
linearly  fused  ring  aggregates.   A  syntax  has  been 
developed  to  describe  those  portions  of  Wiswesser 
notations  which  refer  to  non-benzene  ring  systems. 


Key  Words:   acyclic,  benzene,  chemical  structure  notations, 
connection  tables,  contractions,  ring  system,  syntax  analysis, 
transformation  algorithm,  Wiswesser. 


INTRODUCTION 


The  purpose  of  our  work  with  the  Wiswesser  notation  was  to  make 
generally  available  an  algorithm  for  generating  connection  tables 
from  such  structure  notations.   This  work  has  had  to  be  terminated 
before  reaching  its  final  objective  and  will  not  be  resumed  in  the 
foreseeable  future.   We  therefore  wish  to  make  our  results  available, 
although  they  are  admittedly  incomplete. 


* Currently employed by Information Services, Research, Development
and Control, Eli Lilly & Co., Indianapolis, Indiana 46206.


The  algorithm  here  described  will  provide  connection  tables  from 
the  acyclic  and  benzene  portions  of  a  notation,  and  process  all 
forms  of  contraction  demanded  by  the  rules.   The  known  bugs  in  the 
algorithm  are  indicated,  with  suggestions  for  their  elimination. 
Finally,  our  approach  to  handling  non-benzene  cyclic  notations  is 
described. This comprises a syntactic description of the portions
of a notation enclosed in the L...J or T...J symbols. In addition,
a separate program has been written to construct connection tables for
linearly fused ring aggregates from the syntactically analyzed
notations.

Hyde  et  al.  have  recently  developed  a  program  for  a  similar  analysis 
of Wiswesser notations [1a]. The two approaches differ in several
respects: Hyde et al. use notational symbols as the basic structural
units (e.g., Q, V, 3), whereas we use the individual atoms, bar
hydrogen  attached  to  carbon.   They  in  turn  indicate  the  number  of 
hydrogen  atoms  attached  to  ring  carbon  atoms.   The  program  they 
describe  does  not  interpret  substituent  locants  of  benzene  rings 
in  terms  of  the  individual  ring  atoms.   They  do  not  describe  a 
method  to  handle  the  systematic  contractions  which  the  rules  require. 
Another program for transforming Wiswesser notations also exists [1b]
but  it  is  not  publicly  available. 

The  programs  herein  described,  with  the  exception  of  the  one  written 
in ALGOL-60, were implemented on the NBS PILOT computer facility,
which  has  been  decommissioned  since  this  work.   The  language  used 
was  the  assembly  language,  PEAP,  providing  for  three-address 
instructions  with  operation  code  modifiers,  sequence  control 
register  digits,  and  break-point  control  digit.   The  configuration 
of  the  system  was  64K  of  68-bit  word  core  memory,  magnetic  tape, 
magnetic  wire,  punched  tape,  typewriter,  and  Dataphone  in-  and 
output  and  punched  card  and  raster  scanner  input. 


1.   THE  NOTATION 


The  notation  rules  underwent  change  during  the  time  that  this 
laboratory  was  working  with  them  and  therefore  specific  note  had 
to  be  taken  of  the  modifications. 

The versions of the Wiswesser notation manual of October 1964 and of
April  1965  were  compared  and  in  addition  members  of  the  Wiswesser 
notation  committee  were  contacted  directly  on  specific  points. 


The  following  specific  changes  were  noted  in  the  manual  [2a,  b]: 

1.  The rank of the slash (/) was shifted from between R and S
to between hyphen (-) and zero (0). Cf. Rule 2.

2.  The concept of "unshared bridge atoms" and a rule for their
citation was added. Cf. Rule 31C.

3.  A  section  on  ring  analysis  was  added.   This  does  not  affect 
the notation rules but does give suggestions useful in
error-checking. 

4.  The  section  on  chelate  rings  was  eliminated. 

5.  A  section  was  added  which  comments  at  length  on  redrawing 
structural  formulae  for  easier  encoding,  with  particular 
attention  to  uncrossing  valence  bonds. 

6.  Rules  were  added  for  enciphering  ions,  free  radicals,  and 
isotopic  species.   The  specification  of  the  location  of 
charge,  unpaired  electron,  or  isotope  relies  on  citing 
the  number  of  the  position  within  the  symbol  string  of  the 
symbol  which  refers  to  the  atom  in  question.   Cf.  Rules  50-52. 

7.  Rules  were  added  for  citing  several  types  of  indeterminacies. 
Substituents  can  be  cited  as  attached  at  an  unknown  point,  to 
a  cyclic  atom  or  to  an  acyclic  atom  not  otherwise  specified, 
to  a  specific  ring  system,  to  a  ring  system  between  limits 

of  two  locants,  to  a  specific  side  chain,  or  to  a  specific  alkyl 
or  alkylene  chain.  Symbols  are  defined  for  unspecified  hydrocarbon 
moieties,  for  alkyl  groups  of  specified  size  but  unspecified 
structure, for generic halogen, and for generic metal. Monocoordinated
Markush groups can be cited. Cf. Rules 53-63.

Correspondence with members of the Wiswesser notation committee
disclosed no further extensions to the Wiswesser notation in the areas of
Markush structures or partially indeterminate structures nor any rules
to cover stereoisomerism, coordination complexes, or inorganic
structures. There have, however, been proposals, not yet formalized in
rules, which provide for:

1.  The use of "D" in initiating the notation for a ring system
which contains a donor-acceptor bond [3],

2.  The use of "_0" to signify a metallocene [4], and

3.  The  use  of  "C"  for  cis  and  "T"  for  trans  [5a]. 


Rules  have  also  recently  been  adopted  to  resolve  alternatives  among 
possible  multiplier  contractions  [5b].   There  evidently  is  still  no 
provision  to  distinguish  among  cyclic  double  and  triple  bonds  in 
certain circumstances. Thus, according to Rule 31 [2b], both
cycloocta-1,3,5,7-tetraene (structure I) and cycloocta-1,3,5-triene-7-yne
(structure II) would be enciphered as L8J.


2.   TRANSFORMATION  ALGORITHM 


The algorithm under development for transforming Wiswesser notations
(WISCO) was being so designed as to yield connection tables compatible
with those resulting from the computer routine developed for
transforming Hayward notations [6]. The atoms would generally be listed
in  a  different  sequence  depending  on  whether  a  Wiswesser  or  Hayward 
notation  had  provided  the  input,  but  the  format  of  the  connection 
tables  resulting  from  the  two  transformations  would  be  identical. 
An  algorithm  was  essentially  completed,  programmed  for,  and  partially 
debugged  on  the  NBS  PILOT  computer  facility  to  process  benzene  rings, 
acyclic  portions,  and  multiplier  contractions.   The  algorithm  is 
shown  at  the  stage  it  reached  in  Figures  1  through  19.   The  operation 
of  this  algorithm  is  highlighted  in  sections  2b  and  2c  below. 

No  algorithm  usable  for  processing  the  general  cyclic  portions  of 
Wiswesser  notations  has  been  devised.   However,  a  linguistic  structure 
of  such  cyclic  portions  was  worked  out  as  a  basis  for  a  transformation 
algorithm.   Furthermore,  an  algorithm  for  analyzing  a  limited  class 
of  non-benzene  cyclic  notation  portions  was  devised.   The  syntax  for 
general  cyclic  portions  and  this  limited  algorithm  are  presented 
in  section  2d. 

No  attempt  was  made  in  this  phase  of  the  work  to  deal  with  notations 
for  salts,  ions,  free  radicals,  isotopic  species,  indeterminate 
locations,  or  generic  substructures.   For  the  present  work  a  maximum 
coordination  number  of  four  has  been  assumed.   Furthermore,  it  is 
suspected,  although  the  algorithm  was  not  developed  far  enough  to 
verify  this  suspicion,  that  the  processing  of  complex  combinations 
of  branched  locants  in  ring  system  notations  would  require  an  effort 
disproportionate  to  the  benefit  to  be  derived  from  being  able  to  deal 
with such rarae aves. It appears to us sufficient to be able to
construct  connection  tables  from  notations  containing  simple  branched 
locants,  and  merely  to  shunt  out  those  which  contain  either  two 
branched  locants  from  the  same  locant  or  a  branched  locant  from  a 
branched  locant,  i.e.,  those  covered  by  Rule  39  [2b].   We  also 
believe  that  it  is  sufficient  to  be  able  to  process  notations  for  a 
single  ring  of  rings  (Cf.  Rule  44  [2b]);  it  is  in  any  case  not  clear 
how  Rule  44  is  to  be  applied  to  structures  with  two  or  more  rings  of 
rings. 
WISCO incorporates error checks both on the input notation and on
the program itself. The errors are detected by looking for invalid
situations and printing out an appropriate error message. For
instance, if the character currently being examined is an "H"
(hydrogen) or a "/" (slash mark) and the atom counter J has value 1,
this  indicates  that  the  character  initiates  the  notation  and  the 
cipher  is  rejected  (except  when  the  notation  is  HH).  Another  instance 
is  a  symbol  for  a  multiple  bond  in  an  illegal  context,  e.g., 
UZ  (i.e.,  =NH-,  which,  though  chemically  conceivable,  would  receive 
a  different  coding).   When  a  symbol  which  stands  for  an  atom  set 
that  demands  a  single  bond  connection  to  the  rest  of  the  compound 
occurs,  the  bond  indicator  (set  to  2  or  3  when  U  or  UU  is  encountered) 
is  checked,  and  if  the  indicator  is  non-zero  the  notation  is  rejected. 
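
The two checks just described can be sketched as follows. This is only an illustration in Python, not the PEAP routine: the function name check_cipher and the set SINGLE_BOND_ONLY are hypothetical, and the set lists only "Z" (from the UZ example above) and "Q" (added here as an assumption).

```python
# Hypothetical set of symbols that demand a single-bond connection.
# "Z" follows the UZ example in the text; "Q" is an assumption.
SINGLE_BOND_ONLY = {"Z", "Q"}


def check_cipher(cipher: str) -> None:
    """Raise ValueError for the invalid situations described in the text."""
    # An initial "H" or "/" is rejected, except for the notation HH.
    if cipher != "HH" and cipher[:1] in ("H", "/"):
        raise ValueError("cipher may not begin with 'H' or '/'")

    bond = 0  # bond indicator: 0 = none pending, 2 after "U", 3 after "UU"
    i = 0
    while i < len(cipher):
        ch = cipher[i]
        if ch == "U":
            # "UU" sets the indicator to 3, a lone "U" sets it to 2.
            bond = 3 if cipher[i + 1:i + 2] == "U" else 2
            i += 2 if bond == 3 else 1
            continue
        if ch in SINGLE_BOND_ONLY and bond != 0:
            raise ValueError(f"multiple bond before {ch!r} at position {i + 1}")
        bond = 0  # the pending bond has been consumed by this symbol
        i += 1
```

Under these assumptions, check_cipher("ZV3U1") returns silently, while check_cipher("1UZ") raises an error.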

Several ancillary routines were developed for use with WISCO. These
were for convenience in implementation; they are not direct
consequences of the logic evolved for transforming Wiswesser notations.
Subroutine  DEBUG  (Cf.  Figures  21  through  25)  allows  examination  of 
system  storage  when  a  notation  has  been  fully  processed  or  when  an 
error  exit  occurs.   The  programmer  may  then  either  look  at  storage 
or modify storage and re-enter WISCO at the beginning. The entry
to WISCO is via subroutine INPUT, which reads a cipher one character
at a time, checking each character for acceptability and storing it
in a separate word of memory. (The 6-channel Flexowriter codes
are converted upon input to 8-bit ASCII.) Two spaces in succession
signify  the  end  of  the  cipher.   After  a  cipher  has  been  read  in, 
control  passes  to  WISCO  proper.   Storage  tables  are  set  up  not  at 
assembly  time  but  rather  via  subroutine  GIMME  (Cf.  Figures  26 
through  28).   Storage  is  allocated  pseudo-dynamically;  i.e.,  storage 
is  provided  at  run  time,  but  if  an  overflow  occurs  then  the  storage 
is  automatically  augmented  and  the  cipher  is  reprocessed  afresh. 
GIMME  is  called  with  an  argument  SIZE.   A  value  of  zero  for  SIZE 
signals  initialization  for  a  new  cipher.   If  SIZE  has  a  "small" 
value,  GIMME  creates  table  storage  of  that  size  up  to  memory 
capacity  and  tells  WISCO  the  table  limits  via  SIZE.   A  "large" 
value  of  SIZE  represents  a  table  address  and  tells  GIMME  that  a 
certain  block  of  storage  is  being  returned  and  causes  this  table  to 
be erased from GIMME's records. "Large" and "small" values are
defined  by  a  constant  in  GIMME.   GIMME  will  provide  table  storage 
of  up  to  several  thousand  68-bit  words. 
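
The pseudo-dynamic scheme (allocate at run time, and on overflow enlarge the storage and reprocess the cipher afresh) can be sketched as below. The names TableOverflow, run_with_growing_storage, and store_characters are hypothetical, the doubling policy is an assumption, and the SIZE calling convention of GIMME is not modelled.

```python
class TableOverflow(Exception):
    """Raised by a processing pass that runs out of table storage."""


def run_with_growing_storage(cipher, process, size=4, capacity=4096):
    """Reprocess the cipher from scratch, with a larger table, after each overflow."""
    attempts = 0
    while True:
        attempts += 1
        table = [None] * size  # fresh storage for this pass
        try:
            return process(cipher, table), attempts
        except TableOverflow:
            if size >= capacity:
                raise MemoryError("cipher does not fit in available storage")
            size = min(2 * size, capacity)  # doubling is an assumption


def store_characters(cipher, table):
    """Toy processing pass: one character per table word."""
    for k, ch in enumerate(cipher):
        if k >= len(table):
            raise TableOverflow
        table[k] = ch
    return table[:len(cipher)]


result, attempts = run_with_growing_storage("ZV3U1_2M", store_characters)
```

With the default starting size of four words, the eight-character cipher overflows once and succeeds on the second pass.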

In  the  course  of  processing  a  notation  there  are  several  syntactic 
structures  which  require  WISCO  to  employ  recursive  techniques. 
The  recursion  is  facilitated  by  the  use  of  pushdown  stacks,  i.e., 
last-in first-out storage, which are used for processing
multiplicative contractions and branch symbols, as described in section 2a
below. 


2a.   Multiplicative  Contractions  and  Branch  Symbols 

Multiplicative  contractions  and  chains  of  benzene  rings  necessitate 
recursive  processing  because  multiplied  strings  may  be  nested  (but  not 
overlapped)  and  because  the  routine  must  be  able  to  find  the  proper 
ring  to  which  to  attach  a  given  substituent.   When  the  algorithm  is 
extended  to  handle  general  cyclic  notations  all  ring  systems,  including 
benzene  rings,  must  be  processed  with  the  same  push-down  stacks. 
Similarly,  all  branch  symbols  must  be  processed  recursively.  As  each 
layer  of  a  nested  sequence  of  multiplied  strings  is  encountered  the 
relevant  information  is  pushed  onto  a  stack.   The  expansion  of  the 
notation  (i.e.,  duplication  of  portions  of  the  connection  table)  then 
begins  with  the  innermost  layer  of  the  nest,  and  each  recursion  expands 
everything from a given layer in. For example, in notation VI first
the portion "/NCU1/_4" is expanded; then, at the same level, "/SW0NUM_3";
next, the entire portion from the first "//" to the second "//"; finally
between the triple slashes.

WISCO  recognizes  and  processes  the  following  types  of  multiplicative 
contraction,  an  example  of  each  of  which  is  indicated: 

1.  Multiplication  of  initial  strings  (notation  VII); 

2.  Multiplication  of  terminal  strings  and  side  strings 
(notations  VIII  and  IX); 

3.  Multiplication  on  a  central  asymmetric  unit 
(notations  X  and  XI); 

4.  Polymeric multiplication (notation XII);

5.  Multiplication  of  strings  cited  prior  to  the  citation  of 
the  ring  to  which  they  are  attached  (notation  XIII); 

6.  Multiplication  of  strings  cited  after  the  citation  of  the 
ring  to  which  they  are  attached  (notation  XIV); 

7.  Multiplication  of  strings  attached  to  rings  but  with 
implicit  locants  (notations  XV  and  XVI). 

Push-down stacks are also used for processing:

8.  Rings: the correspondence between atom number and locant
on each ring is stored together with the number of possible
substituents at each locant;

9.  Branched atom symbols: the atom number of that atom is
stored once for each branch attached; and

10.  Variable valence branching symbols: the atom number is
stored for each variable valence atom; a flag is stored
for each ring. Each atom with variable valence is pushed
onto the branched atom stack four times, which exceeds the
number of branches permitted by the present algorithm.
Whenever the routine has finished with the top entry in the
variable valence stack, it erases the record of the
corresponding atom from the top of the branched atom stack or of the
corresponding ring from the top of the ring stack.


The structures to which this and other notations refer are shown
in the appendix.


VI     ZVX0aM1YaV0YUSaYGU1X///lUNY//0/NCU1/_4
       X/SWONUM  _ 3//_2  ///_ 3

VII    ZYUSaO-2N101U1M_2Y3UYZSONNUMax

VIII   ZYSHYV02M1U1Y/S01XZaNW_2

IX     Z5X/MNUNNW_26Y0

X      QVY01U1_2/PQ0/

XI     Z1U10_3/YSWM/

XII    ZSW/2VM/-4SW0

XIII   WN-4-B_C-DR-EZ

XIV    ZOVR_B-_C-_E-_  F-/V00_4

XV     WN_5-R_FZ

XVI    WNR-/Z_5


XVII   H2N-C(=O)-CH2-CH2-CH=CH-NH-CH=CH-CH2-CH2-C(=O)-NH2

Not  all  of  the  above  features  were  successfully  implemented.   The 
subroutine which processed the locants used in multiplication type 4
above  did  not  give  the  proper  results;  it  was  removed  and  has  not 
yet  been  rewritten.  A  single  atom  number  for  the  last  atom  in  a  string 
of  type  1  or  the  first  atom  in  a  string  of  type  2  above,  is  stored  for 
reference  in  determining  the  limits  for  multiplication.  This  provision 
should be modified also to allow for recursion because, e.g., in a
nested  sequence  of  type  1  multiplication  only  the  atom  number  needed 
for  the  last-encountered  multiplier  is  now  retained. 

2b.  Acyclic Notations

With these conventions in mind, we turn now to the processing of a cipher.
Consider structure XVII, which is coded as ZV3U1_2M. The fact that the
first character is a Z calls a routine which stores the NH2 group
as follows:

1  H  (0,2)
2  N  (0,1)  (0,3)
3  H  (0,2)

where for each (i,j) couple i represents the bond type for the
connection between the current atom and atom j. A single bond is
indicated by i = 0; double, triple, and nonlocalized multiple bonds
are represented by i = 2, 3, and 1, respectively. If the NH2 group
had not begun the notation, it would be stored in the order N, H, H,
thus terminating a path. But in this case the path must continue
from Z with the next atom, and a back connection of 2 (the atom number
of the N) is stored. In a similar way, the symbol V is translated
as C=O and the connection table becomes:


1  H  (0,2)
2  N  (0,1)  (0,3)  (0,4)
3  H  (0,2)
4  C  (0,2)  (2,5)
5  O  (2,4)

Actually atomic numbers are stored, not atomic symbols. The numbers
of the atoms on the left are for ease of reading only; they do not
actually appear in the machine representation.
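
As an illustration of this representation, the following Python sketch stores one row per atom, each holding an atomic number and a list of (i,j) couples. The helper names new_atom, bond, and render are hypothetical; the original stored the rows in 68-bit machine words, which is not modelled here.

```python
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8}
SYMBOL = {z: s for s, z in ATOMIC_NUMBER.items()}


def new_atom(table, symbol):
    """Append a row [atomic number, couples] and return its atom number."""
    table.append([ATOMIC_NUMBER[symbol], []])
    return len(table)


def bond(table, a, b, kind=0):
    """Record the connection in both rows; kind 0 = single, 2 = double."""
    table[a - 1][1].append((kind, b))
    table[b - 1][1].append((kind, a))


def render(table):
    """Readable listing; the row numbers are for the reader only."""
    lines = []
    for n, (z, couples) in enumerate(table, start=1):
        text = " ".join(f"({i},{j})" for i, j in couples)
        lines.append(f"{n} {SYMBOL[z]} {text}")
    return lines


table = []
h1 = new_atom(table, "H")       # initial Z is stored as H, N, H
n2 = new_atom(table, "N")
bond(table, h1, n2)
h3 = new_atom(table, "H")
bond(table, n2, h3)
c4 = new_atom(table, "C")       # V is stored as C=O
bond(table, n2, c4)             # back connection to atom 2
o5 = new_atom(table, "O")
bond(table, c4, o5, kind=2)
lines = render(table)
```

The list lines then holds the five rows of the table shown above.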


The  symbol  3  calls  a  subroutine  which  stores  in  sequential  order 
as  many  carbons  as  indicated  by  an  alkyl  numeral.   The  symbol  U 
causes a bond indicator to be set equal to 2 and processing continues.
As  each  atom  is  encountered  in  the  symbol  string,  the  back  connection 
is  stored  in  its  row  of  the  connection  table  and  its  number  is 
stored  in  the  row  of  the  atom  so  connected  to  it.   Thus  the  bond 
indicator  is  set  before  the  bond  to  which  it  refers  has  been  stored. 
When  atom  9  has  been  reached,  the  connection  table  appears  thus: 


1  H  (0,2)
2  N  (0,1)  (0,3)  (0,4)
3  H  (0,2)
4  C  (0,2)  (2,5)  (0,6)
5  O  (2,4)
6  C  (0,4)  (0,7)
7  C  (0,6)  (0,8)
8  C  (0,7)  (2,9)
9  C  (2,8)
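
The processing up to this point can be sketched as a small decoder for the symbols Z, V, U, and single-digit alkyl numerals. The function decode_acyclic is hypothetical and covers only these four symbols; multi-digit numerals, multipliers, and branching are deliberately left out.

```python
def decode_acyclic(cipher):
    """Build a connection table for a cipher made of Z, V, U and digits 1-9."""
    table = []  # rows: [symbol, [(bond type, atom number), ...]]

    def add(symbol, back, bond=0):
        """Append an atom; back = 0 means no back connection."""
        table.append([symbol, []])
        n = len(table)
        if back:
            table[n - 1][1].append((bond, back))
            table[back - 1][1].append((bond, n))
        return n

    last = 0   # atom from which the path continues
    bond = 0   # bond indicator, set by U before the bond itself is stored
    for pos, ch in enumerate(cipher):
        if ch == "U":
            bond = 3 if bond == 2 else 2
        elif ch == "Z" and pos == 0:
            h = add("H", 0)          # initial Z: H, N, H; path continues from N
            last = add("N", h)
            add("H", last)
        elif ch == "Z":
            n = add("N", last, bond)  # terminal Z: N, H, H
            add("H", n)
            add("H", n)
            last, bond = n, 0
        elif ch == "V":
            c = add("C", last, bond)
            add("O", c, 2)
            last, bond = c, 0
        elif ch in "123456789":
            for _ in range(int(ch)):
                last = add("C", last, bond)
                bond = 0
        else:
            raise ValueError(f"symbol {ch!r} is not handled by this sketch")
    return table


table = decode_acyclic("ZV3U1")
```

Decoding "ZV3U1" in this way yields the nine rows shown above, with the double bond between atoms 8 and 9 stored when atom 9 is reached.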

Next  the  routine  encounters  "_2",  a  space  followed  by  a  number, 
which  must  be  a  multiplier.   Several  possibilities  are  consequently 
checked  for  immediately  following: 

i)  a space followed by a letter, i.e.,
a locant

ii)   "-R",  i.e.,  implicit  locants 

iii)   "/",  i.e.,  possible  central  asymmetric 
unit 

iv)   two  spaces,  i.e.,  the  end  of  the  cipher. 

Any  one  of  these  situations  would  cause  specific  action.   The  present 
example,  however,  requires  further  analysis,  since  the  local  context 
of  the  multiplier  admits  either  a  multiplicative  contraction  of  type  1 
above  or  the  multiplied  side  chains  of  type  2  above.   The  routine  as 
written  pushes  the  multiplier  (2)  onto  the  top  of  DECML  and  the  most 
recent  atom  number  (9)  onto  JKEEP. 

A  modification  of  the  routine  would  permit  resolution  of  this  situation, 
since  if  type  2  multiplicative  contraction  were  the  case  a  slash  would 
have  been  noted  on  the  slash  stack.  A  slash  between  the  beginning  of 
the  notation  and  the  multiplier  must  be  paired  with  the  latter,  for  the 
slash  delimits  the  range  of  the  multiplier.   If  the  slash  stack  is 
empty  the  multiplier  covers  everything  back  to  the  beginning  of  the 
cipher. 
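
The context test for a multiplier, together with the suggested use of the slash stack to separate types 1 and 2, might look as follows. This is a sketch of the proposed modification, not of the routine as written; the function classify_multiplier and its return labels are hypothetical, and a blank is written "_" as in the printed notations.

```python
SPACE = "_"  # a blank is printed as "_" in the notations of this report


def classify_multiplier(cipher, pos, slash_stack):
    """Classify the multiplier whose leading space is at index pos."""
    if cipher[pos] != SPACE or not cipher[pos + 1:pos + 2].isdigit():
        raise ValueError("no multiplier at this position")
    end = pos + 1
    while end < len(cipher) and cipher[end].isdigit():
        end += 1
    multiplier = int(cipher[pos + 1:end])
    rest = cipher[end:]

    if rest == "" or rest.startswith(SPACE * 2):
        kind = "end of cipher"          # case iv (empty rest is an assumption)
    elif rest[0] == SPACE and rest[1:2].isalpha():
        kind = "locant"                 # case i
    elif rest.startswith("-R"):
        kind = "implicit locants"       # case ii
    elif rest.startswith("/"):
        kind = "central unit"           # case iii
    elif slash_stack:
        kind = "type 2"                 # range reaches back to the paired slash
    else:
        kind = "type 1"                 # range reaches back to the beginning
    return kind, multiplier


example = classify_multiplier("ZV3U1_2M", 5, [])
```

For the cipher ZV3U1_2M the slash stack is empty and none of the four special contexts applies, so the multiplier 2 is classified as type 1.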
The next character examined is "M", and the connection table looks
as follows after it has been processed:

 9  C  (2,8)  (0,10)
10  N  (0,9)  (0,11)
11  H  (0,10)

Now we have reached the end of the cipher. At this point a check
of the various stacks is made to see whether there is any outstanding
business, in particular whether any multiplication needs to be
effected or whether any methyl contractions need to be expanded.
If the branched atom stack is not empty, the atom indicated has an
implied methyl attached. Once the branched atom stack is empty
(as it is immediately in this case), JKEEP is tested. If JKEEP = 0,
ring locant storage is checked for ring multiplication. In our
example, JKEEP ≠ 0, which means that multiplication of type 1 has
been defined. (WISCO as presently implemented will not detect
type 2 multiplication since it assumes at this point that type 1
is specified. This defect could be eliminated as indicated previously.)
A subroutine then copies the portion of the connection table indicated,
in our case from atom 1 to atom 9, indicated by JKEEP. A switch
indicates that the sequence is to be copied in the reverse direction,
i.e., from the end to the beginning. The values of j in the (i,j)
couples must be suitably modified during copying. (This routine will
work satisfactorily with a slight modification for "forward" copying,
but a bug as yet uncorrected prevents proper "reverse" copying.)
The end result of the algorithm will be:


 1  H  (0,2)
 2  N  (0,1)   (0,3)   (0,4)
 3  H  (0,2)
 4  C  (0,2)   (2,5)   (0,6)
 5  O  (2,4)
 6  C  (0,4)   (0,7)
 7  C  (0,6)   (0,8)
 8  C  (0,7)   (2,9)
 9  C  (2,8)   (0,10)
10  N  (0,9)   (0,11)  (0,12)
11  H  (0,10)
12  C  (0,10)  (2,13)
13  C  (2,12)  (0,14)
14  C  (0,13)  (0,15)
15  C  (0,14)  (0,16)
16  C  (0,15)  (2,17)  (0,19)
17  O  (2,16)
18  H  (0,19)
19  N  (0,16)  (0,18)  (0,20)
20  H  (0,19)


Type  2  multiplications  are  handled  in  an  analogous  way,  with  the 
copy  switch  set  for  forward  copying. 
To see how the routine handles type 3 multiplication, we can
generalize  from  the  foregoing  example.   This  type  is  recognized 
by  a  multiplier  followed  by  a  slash  when  the  slash  stack  is  empty. 
The  atom  number  for  the  last  atom  symbolized  before  the  multiplier  is 
stored  in  JKEEP.   When  the  terminal  slash  is  encountered,  the  last 
non-terminal atom within the slashes is under current consideration
and  is  joined  to  a  duplicate  of  the  last  atom  of  the  repeated  string. 
The  copying  routine  is  called  with  the  "reverse"  switch  set.   If  the 
multiplier  is  greater  than  2,  then  the  branched  atom  stack  will 
indicate  the  additional  points  of  attachment  within  the  asymmetric 
central  unit  (Cf.  structure  XI).   Any  residual  connections  within 
the  slash  marks  then  imply  methyl  groups. 

Polymer-type multiplication, type 4, is handled straightforwardly
by  "forward"  copying  of  all  of  the  atoms  symbolized  between  the  pair 
of  slashes. 

Since  there  is  at  least  a  superficial  reason  to  expect  confusion 
between  the  "&"  for  an  explicitly  cited  methyl  group  and  the  "&" 
ending  a  branch  containing  either  a  variable  valence  atom  or  a 
ring,  we  shall  consider  the  relevant  contexts.   When  an  "&" 
(which  is  not  immediately  preceded  by  another  "&")  is  reached 
there  are  three  possibilities — the  preceding  atom  symbol  may  be 
"X",  "Y",  "K",  or  "N";  it  may  be  strictly  terminal;  or  it  may  be 
a non-terminal symbol other than "X", "Y", "K", or "N". If the
preceding  atom  symbol  is  one  of  these  branching  symbols,  then 
the  "&"  signals  methyl  contraction,  as  do  any  succeeding  ampersands 
until  the  branching  requirements  of  the  atom  have  been  satisfied. 
(Cf.  notation  XVIII.)   If  the  atom  symbol  preceding  the  "&"  is  a 
non-terminal symbol other than one of these branching symbols,
then  the  first  "&"  signals  the  end  of  a  chain.   If  there  is  a 
branching  carbon  or  nitrogen  at  the  top  of  the  branched  atom  stack, 
and  further  "&"s  follow,  then  the  succeeding  ampersands  signal  methyl 
contractions  until  the  branching  requirements  of  the  indicated  atom 
have  been  satisfied.   (Cf.  notation  XIX).   For  either  case  of  a 
non-terminal  atom  symbol  preceding  the  first  "&",  each  further  "&" 
then  indicates  that  all  branches  attached  to  one  variable  valence 
atom  have  been  cited.   (Cf.  notations  XX  and  XXI).   For  each  such 
succeeding  "&"  the  top  of  the  variable  valence  stack  is  fetched; 
if  it  is  a  ring  flag,  then  the  corresponding  ring  is  discarded  from 
further  consideration;  if  it  is  an  atom  number,  then  the  top  of 
the  branch  symbol  stack  is  discarded  as  long  as  this  equals  the  top 
of  the  variable  valence  stack;  then  the  top  of  the  variable  valence 
stack  is  discarded. 

If the atom symbol preceding the first "&" is strictly terminal, then
this "&" does not end the chain and it must therefore either represent
a methyl group or signal the end of a branch containing either a
variable valence atom or a ring (Cf. notation XXII). Note that
a methyl contraction must refer to the branch symbol immediately
preceding it and that the implied methyl contraction is limited to
the last branch symbol (and to "X", "Y", "N", or "K") so that there
can be no ambiguity in the meaning of the top of the branched
symbol stack.


XVIII   zxaai-si-QOQ

XIX     zx2aa-si-ooo

XX      ZiYPGYaaSYG-SI-QOO

XXI     Z1YPGY2aaaYG-SI-0QQ

XXII    QYGVR„ER  =  DGa„CY

XXIII   ZYa2Y1-SE-*2Q2Q20aYaY1Z

XXIV    ZYa2Y1-SE-W2QaYaY1Z


[Structural diagrams XXV and XXVI]

Rule  8b  [2b]  appears  to  require  one  "&"  for  each  branch  point 
encountered while backing down the branch to the point where ciphering
resumes.   This  complicates  the  algorithm  and  appears  to  be  unnecessary 
to  the  notation  system.   It  is  agreed  that  one  "&"  per  variable 
valence  branch  point  would  suffice  [5b].   However,  this  modification 
has  not  (yet)  been  adopted. 

There  is  possible  ambiguity  due  to  symbols  which  may  or  may  not  be 
branching  symbols  depending  on  valence.   We  have  adopted  the  asterisk 
as  a  flag  to  follow  such  a  symbol  when  it  is  a  branching  symbol 
(Cf.  notation  XXIII),  except  when  the  only  branches  attached  are 
represented  by  "W"  (Cf.  notation  XXIV). 

The  existing  algorithm  has  a  known  bug  in  its  handling  of  branches. 
It will decode the notation TA-5PTJ_ER_D1UYX1MVM_CL66TJ, for structure
XXV, in such a way as to yield the connections Y-X-1-M, i.e., it does
not  in  this  instance  take  cognizance  of  Rule  10  [2b],  which  demands 
deletion  of  the  "&"s  in  a  methyl  contraction  on  the  only  or  last 
branch  symbol  in  the  (uncontracted)  notation. 

In  order  to  remove  this  bug,  a  back  connection  to  a  branch  point 
must  not  be  made  until  the  notation  has  been  completely  processed, 
i.e.,  until  the  end  of  the  cipher  has  been  reached.   From  the  above 
example  it  is  evident  that  the  last  branch  point  in  the  notation  can 
be  determined  only  after  all  characters  have  been  examined.   This 
suggests  to  us  that  rules  other  than  Rule  10  may  lead  to  similar 
situations,  and  that  similar  bugs  remain  to  be  discovered. 


2c.   Benzene  Notations 


We  begin  by  considering  a  simple  example:   Structure  XXVI  is 
ciphered  WNR_CZ_DZ. 

The  first  two  characters  are  processed  as  indicated  in  the  preceding 
section,  a  subroutine  handling  "WN"  in  a  manner  similar  to  the 
handling  of  an  initial  "Z".   When  the  "R"  is  encountered,  six  carbons 
are  entered  into  the  connection  table  with  the  first  and  last  atoms 
connected to each other and with nonlocalized multiple bonds indicated:

1  O  (2,2)
2  N  (2,1)  (2,3)  (0,4)
3  O  (2,2)
4  C  (0,2)  (1,5)  (1,9)
5  C  (1,4)  (1,6)
6  C  (1,5)  (1,7)
7  C  (1,6)  (1,8)
8  C  (1,7)  (1,9)
9  C  (1,8)  (1,4)

Next a temporary block of six words is pushed onto the ring stack.
For our example this block would look as follows:

1  0  4  A
0  1  5  B
0  1  6  C
0  1  7  D
0  1  8  E
0  1  9  F

The  1  in  the  left-hand  position  of  the  first  word  indicates  a 
benzene  ring,  in  anticipation  of  the  routine  enlarged  for  processing 
general cyclic structures. The next field contains the number of
substitutable  positions  left  for  each  ring  atom,  decremented  each 
time  a  substituent  is  attached.   The  third  field  has  the  atom  number 
corresponding  to  the  locant  given  in  the  right-most  field.   When, 
in  our  example,  the  locant  "_C"  is  encountered,  the  storage  block 
for  the  current  ring  is  searched  for  the  locant  and  the  corresponding 
atom  number  (here  6)  is  stored  as  the  back  connection,  the  second 
field  is  decremented,  and  processing  continues  with  the  substituent 
symbol  following  the  locant.  After  the  first  such  substituent  has 
been  processed,  the  connection  table  appears  in  part  as: 

 6  C  (1,5)   (1,7)   (0,10)


 9  C  (1,8)   (1,4)
10  N  (0,6)   (0,11)  (0,12)
11  H  (0,10)
12  H  (0,10)
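
The ring handling described above can be sketched as follows, assuming the WN group already occupies atoms 1 to 3. The names link, add_atom, add_benzene, and locant_atom are hypothetical, and each ring-block word is modelled as a four-item list [benzene flag, free positions, atom number, locant].

```python
def link(table, a, b, bond):
    """Store the (bond, atom) couple in both rows."""
    table[a - 1][1].append((bond, b))
    table[b - 1][1].append((bond, a))


def add_atom(table, symbol, back, bond=0):
    """Append an atom back-connected to atom `back`; return its number."""
    table.append([symbol, []])
    link(table, back, len(table), bond)
    return len(table)


def add_benzene(table, ring_stack, back):
    """Enter six ring carbons (bond type 1) and push a ring block."""
    first = len(table) + 1
    table.extend(["C", []] for _ in range(6))
    link(table, back, first, 0)
    for k in range(5):
        link(table, first + k, first + k + 1, 1)
    link(table, first + 5, first, 1)  # close the ring
    block = [[int(k == 0), 1, first + k, "ABCDEF"[k]] for k in range(6)]
    block[0][1] -= 1                  # locant A is used by the back connection
    ring_stack.append(block)


def locant_atom(ring_stack, locant):
    """Look up a locant in the current ring block and claim its position."""
    for word in ring_stack[-1]:
        if word[3] == locant:
            if word[1] == 0:
                raise ValueError(f"locant {locant} has no free position")
            word[1] -= 1
            return word[2]
    raise ValueError(f"locant {locant} not in current ring")


# WNR_CZ: the nitro group is entered first, then the ring, then NH2 at C.
table = [["O", [(2, 2)]], ["N", [(2, 1), (2, 3)]], ["O", [(2, 2)]]]
ring_stack = []
add_benzene(table, ring_stack, back=2)
n = add_atom(table, "N", locant_atom(ring_stack, "C"))
add_atom(table, "H", n)
add_atom(table, "H", n)
```

After these calls, row 6 carries the back connection (0,10) and the second field of the word for locant C has dropped to zero, as in the partial table above.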

When  an  "&"  is  encountered  that  finishes  a  branch  containing  a  ring 
(Cf.  cipher  XXII),  the  current  ring  block  is  discarded  and  processing 
returns  to  the  previous  ring.   In  this  example  a  block  is  set  up  for 
the  first  "R"  and  the  locant  "_E"  is  interpreted  with  its  aid;  then  a 
second  block  is  pushed  onto  the  stack  for  the  second  "R".   The  locant 
"_D" is interpreted from the second block. The "&" then causes the
second block to be erased, and the locant "_C" is interpreted
again from the first block.

For  a  cipher  with  type  5  multiplication  (Cf.  notation  XIII),  the 
locants  cited  before  the  ring  are  stored  in  a  table.   When  the  ring 
block  has  been  created  for  the  "R",  these  locants  provide  the  back 
connections  for  the  repeated  string.   Type  6  multiplication  (Cf. 
notation  XIV)  is  handled  in  much  the  same  way  except  that  it  is  the 
hyphenated  locants  that  are  packed  into  a  table. 

The  analysis  of  type  7  multiplication  is  not  debugged  in  the  current 
version  of  WISCO.   For  either  of  the  two  cases,  expansion  of  the 
multiplied  group  (as  intended)  must  be  the  last  item  in  processing 
the  notation,  because  the  counts  are  implicit.   When  either  "-R"  or 
"-/"  is  encountered,  an  appropriate  flag  corresponding  to  the  proper 
ring  can  be  pushed  onto  the  top  of  a  stack.   When  the  end  of  the  cipher 
is  reached  (and  all  other  matters  have  been  attended  to),  the  residual 
locants on the ring, as indicated by 1's in the second field of the
atom  words  of  the  ring  stack,  provide  the  back  connections  for  the 
multiplied  strings. 

The  copying  routine  requires  knowing  the  atom  number  for  the  back 
connection and the limits of the connection table to be copied. In
"forward" copying, the replicate of the first atom of the repeated
portion  is  back-connected  to  the  atom  indicated  above;  all  other  atom 
numbers  in  the  (i,j)  couples  are  edited  during  copying  by  adding  a 
constant  to  correspond  to  the  position  in  the  connection  table  of  this 
portion  after  replication.   For  "backward"  copying,  the  editing  of  atom 
numbers  requires  subtraction  from  a  constant. 
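A minimal Python sketch of "forward" copying follows. It rests on assumptions not stated in the text: the connection table is modelled as a dictionary from atom number to (symbol, list of (i,j) couples), the copy is appended at the end of the table, and the row of the back-connected atom is given a matching couple so that the table stays symmetric. The function name copy_forward is hypothetical.

```python
def copy_forward(table, lo, hi, back_atom):
    """Append a replicate of rows lo..hi, back-connected to back_atom.

    table maps atom number -> (symbol, [(code, neighbour), ...]).
    Returns the constant that was added to the internal atom numbers.
    """
    offset = max(table) + 1 - lo
    for atom in range(lo, hi + 1):
        symbol, couples = table[atom]
        edited = []
        for code, nbr in couples:
            if lo <= nbr <= hi:
                # neighbour inside the repeated portion: add the constant
                edited.append((code, nbr + offset))
            else:
                # neighbour outside it: this was the old back connection
                edited.append((code, back_atom))
                # assumption: also record the new bond on the back atom
                table[back_atom][1].append((code, atom + offset))
        table[atom + offset] = (symbol, edited)
    return offset

# N(1)-C(2)-C(3); repeat the two-carbon chain on the nitrogen.
table = {1: ("N", [(0, 2)]),
         2: ("C", [(0, 1), (0, 3)]),
         3: ("C", [(0, 2)])}
offset = copy_forward(table, 2, 3, 1)
```

"Backward" copying would differ only in the arithmetic applied to the atom numbers, which the text describes as subtraction from a constant.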

2d.   General  Cyclic  Notations 

The  basis  devised  for  an  algorithm  to  effect  the  complete  analysis 
of  cyclic  notations  is  the  separation  of  those  portions  of  the 
notation  which  lie  between  the  delimiters  L  and  J  or  T  and  J  into 
distinct fields. Each of these fields deals with one of the several
types  of  features  which  may  be  cited  as  parts  of  the  cyclic  system. 
The  results  of  such  an  analysis  would  be  used  as  input  to  an  algorithm 
for  creating  an  atom-by-atom  listing  of  each  cited  ring,  such  as  is 
shown  for  the  special  case  of  linearly  fused  rings  (Cf.  below),  and 
for  identifying  the  nature  of  the  bond  between  each  pair  of  atoms. 

The fields which may be present are initial character, ring declaration,
shared bridge locants, unshared bridge locants, multicyclic points,
last locant, hetero symbols, unsaturation codes, ring saturation code,
and final character. For example, in the notation T666_B6_C6_3ABC_S_KMJ
for the cyclic system shown as structure III, the fields present
are: initial character, ring declaration, multicyclic points, last
locant, hetero atom, and final character.

The initial character must be an L or T, which is retained for later
error checking. The routine then looks for ring information, filling
a table with the ring size and the atom number corresponding to the
initial (fusion) locant of each ring, in the order of citation.
Thus, for our example the ring declaration would be as follows:


Initial    Ring
Locant     Size

   1         6
   1         6
   1         6
   2         6
   3         6

The  end  of  this  portion  of  the  routine  is  signaled  by  any  one  of 
the  following: 

1.  Locant  followed  by  a  non-numeric  character 

2.  "/" 

3.  Space  followed  by  a  number 

4.  "K", "M", "N", "O", "P", "S", "V", "X", "Y", or hyphen
followed by a non-numeric character

5.  "&" or "T"

6.  Final character "J"
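The scan of the ring declaration field can be sketched in Python as below. This is an illustration under assumptions, not the routine itself: the function name scan_ring_declaration is hypothetical, the space is written "_" as in the notations printed here, locants are taken to be single unmodified letters followed by a one-digit ring size, and ring sizes are not validated. Any character sequence that does not continue the declaration is treated as one of the six end signals.

```python
def scan_ring_declaration(notation):
    """Return (table, stop) for the ring declaration following L or T.

    table holds (initial locant number, ring size) pairs in order of
    citation; stop is the index of the character that ended the field.
    """
    if not notation or notation[0] not in "LT":
        raise ValueError("initial character must be L or T")
    table = []
    i, n = 1, len(notation)
    while i < n:
        ch = notation[i]
        if ch.isdigit():
            # ring size with suppressed locant A
            table.append((1, int(ch)))
            i += 1
        elif ch == "-" and i + 1 < n and notation[i + 1].isdigit():
            # multi-digit ring size written between hyphens
            close = notation.index("-", i + 1)
            table.append((1, int(notation[i + 1:close])))
            i = close + 1
        elif (ch == "_" and i + 2 < n and notation[i + 1].isalpha()
              and notation[i + 2].isdigit()):
            # locant followed by a ring size
            locant_number = ord(notation[i + 1]) - ord("A") + 1
            table.append((locant_number, int(notation[i + 2])))
            i += 3
        else:
            break   # one of the end signals listed above
    return table, i

table, stop = scan_ring_declaration("T666_B6_C6_3ABC_S")
```

For this input the scan stops at the space preceding the digit 3, which is end signal 3 (space followed by a number).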


If  the  ring  declaration  is  followed  by  a  locant  and  a  non-numeric 
character (Cf. notation IV), then a shared bridge atom, a ring
segment  symbol,  a  hydrogen  saturation  code,  or  an  unsaturation  code 
follows.   Such  locants  are  converted  to  the  corresponding  number 
and  stored  in  another  table.   Similarly  a  slash  indicates  at  least 
one  unshared  bridge  atom  (Cf.  notation  V).   A  space  followed  by 
a  number  indicates  that  multicyclic  point  atoms  are  cited,  and  these 
are  similarly  stored  in  a  table.   For  the  above  example  (structure  III) 
the  table  of  such  atoms  would  appear  as  follows: 

1
2
3

As with the ring declaration field, these fields are well-defined
and their terminations can be detected immediately. If multicyclic
point atoms are present, the last locant pertaining to the cyclic
system must also be cited. This is similarly converted to a number
and stored for future reference. Any ring segment symbols are then
picked up and stored along with the numerical equivalent of their
locants; in the example above (structure III)

11 (= K)   M

is the only such entry. Similarly, citations of H and U or UU are
stored with the (converted) locants.

"K"  or  one  of  the  other  symbols  listed  under  possibility  4  indicates 
a  ring  segment  symbol  with  implied  locant  _A  in  a  monocycle.  A  hyphen 
followed by a non-numeric character presumably introduces a two-letter
atomic symbol.

The  "T"  and  "&"  symbols  are  stored  and  assigned  a  number  according  to 
the  order  of  citation  so  that  they  may  be  paired  with  the  corresponding 
rings.   The  occurrence  of  "J"  signals  the  end  of  the  cyclic  portion 
of  the  notation,  and  control  must  then  revert  to  the  calling  routine 
for  processing  of  the  tables  here  generated  and  of  portions  of  the 
notation  referring  to  attachments  to  the  cyclic  systems,  including 
interpretation  of  substituent  locants.  As  in  the  case  of  benzene 
rings  (Cf.  p.  11  above),  a  temporary  storage  block  would  be  created 
and  pushed  onto  the  ring-stack  for  each  ring  system  encountered.   The 
number  of  words  required  equals  the  numeric  value  of  the  last  locant 
occurring in the ring system plus one word per branched locant. The
last locant may be stated explicitly in the notation, or else the
number of atoms in the ring system may be computed via the
Landee-Bowman equation [ref. 2b, p. 108].

A routine has been written in ALGOL-60 (Cf. Figure 20) to decode
Wiswesser  line  notations  for  linearly  fused  ring  aggregates  (i.e., 
ones  with  neither  multicyclic  nor  enclosed  atoms).   It  assumes 
that  the  notation  has  been  analyzed  and  tabulated  as  above.  On  the 
basis  of  this  information  a  two-dimensional  array  (with  its  size 
determined  by  the  number  of  rings  and  the  maximum  ring  size)  is 
created  which  lists  each  ring  and  the  enumeration  of  the  atoms 
therein. 

If the notation were L_B666J, then the ring declaration table would be:

Initial    Ring
Locant     Size

   2         6
   1         6
   1         6

This routine would then fill up the array as follows:
The first ring starts with atom number 2 and has size 6:

2   3   4   5   6   7

The second ring starts with atom number 1 so the array would become

2   3   4   5   6   7
1   2

Since atom number 2 has been used in a previous ring, we must get
the last atom in that ring and proceed (to six atoms) thus:

2   3   4   5   6   7
1   2   7   8   9  10

Again the next ring starts with atom number 1, but atom number 2 has
already been twice used, so we must get the last atom in the last
previous ring cited and continue:

2   3   4   5   6   7
1   2   7   8   9  10
1  10  11  12  13  14
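The array-filling steps traced above can be sketched in Python. This is a hedged restatement, not the authors' program: the ALGOL-60 routine of Figure 20 is the original, and the sketch below replaces its jumps with loops. The function name decode_linear is hypothetical, and the behaviour has been checked only against the worked example L_B666J.

```python
def decode_linear(iloc, size):
    """Fill one row of atom numbers per ring of a linearly fused aggregate.

    iloc[r] is the atom number of the initial locant of ring r and
    size[r] its ring size, both in order of citation.
    """
    rings = []
    used = 0    # initial locant of the previous ring (0 = no previous ring)
    fused = 0   # last atom numbered in the previous ring
    for start, n in zip(iloc, size):
        row = []
        k = start
        for _ in range(n):
            row.append(k)
            # on reaching the previous ring's initial locant, jump to the
            # last atom of that ring; otherwise take the next atom number
            k = fused if k == used else k + 1
        rings.append(row)
        fused, used = row[-1], start
    return rings

rings = decode_linear([2, 1, 1], [6, 6, 6])
```

Within each row, consecutive entries are bonded and the last entry closes the ring back to the first, so the rows can be turned into (i,j) couples directly.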

A  preliminary  syntax  for  this  analysis  has  been  developed.   This 
syntax,  as  given  in  Table  I,  will  apparently  generate  (or,  equivalently, 
accept)  that  set  of  correct  Wiswesser  notations  which  corresponds  to 
the  set  of  possible  non-benzene  cyclic  structures  excluding  the 
so-called ring-of-rings structures, whose notations begin with L- or T-.
It  can  be  used  to  formulate  an  algorithm  either  to  generate  or  to 
accept  individual  notations.   The  syntax  distinguishes  in  those  portions 
of  the  notation  beginning  with  L  or  T  and  ending  with  J  the  following 
fields,  each  of  which  contains  a  specified  type  of  information: 

Initial character (L or T)

Ring declaration
    Fusion locant (may be null)
    Ring size

Enclosed ring atoms
    Unshared bridge atoms
    Shared bridge atoms

Multicyclic points and last locant

Ring segment symbol (required if initial character is T)

Carbon saturation (H)

Unsaturation (U or UU)

Ring saturation (& and/or T)

Final character (J)

The  first  two  fields  and  the  last  must  be  present  in  any  non-benzene 
cyclic notation. The others may or may not be present, depending
entirely on the structure under consideration. The symbol D signaling
a  ring  structure  with  a  donor-acceptor  bond  [3]  is  not  accounted 
for  by  this  syntax. 

Note,  however,  that  this  syntax  is  too  powerful,  in  the  sense  that 
it  will  generate  strings  of  symbols  which  are  not  acceptable 
Wiswesser  notations.   For  instance,  any  series  of  ring  declarations 
can be generated, including invalid ones such as in L_B6J. Implementation
of restrictions such as that which requires either <RSS>
(see below) to be <NULL> or else <SYMBOL> to be V, X, or Y whenever
<IC> is L will require a considerable increase in the complexity
of the rules.

The syntax is presented in the so-called Backus Normal Form [7,8].
The metanotational terms (e.g., <IC> for initial character) are
enclosed in pointed brackets, and each such appears exactly once
on the left-hand side of a rule (i.e., a string of the form ... ::=
... ). The symbol ::= is read "is replaced by" or "is written as".
The vertical line indicates alternatives; concatenation means "and".
Thus the rule

<IC> ::= L|T

means that any occurrence of the symbol <IC> is replaced by
L or T; the rule

<UBA> ::= /<ATLOC>

indicates that <UBA> is rewritten as / followed by <ATLOC>.

Note that some rules are recursive: the symbol on the left of a
given rule may also appear in the right-hand side thereof, as in

<HYP> ::= -|-<HYP>.

This rule gives rise to the string of hyphens which indicate a branched
locant at a given depth. To apply the rule, rewrite <HYP> as either
- or -<HYP>. If the former alternative is chosen, the recursion
ends; if not, reapply <HYP>. This sequence is continued until a
string of hyphens of the desired length is obtained. Finally, for
the fields which are optionally present the alternative <NULL> is
given, as in the case of <ERA>, to allow for their non-occurrence.
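The recursive hyphen rule, read here as <HYP> ::= - | - <HYP>, can be illustrated with a short Python sketch that applies the rule in both directions. The function names generate_hyp and match_hyp are hypothetical and are not part of the syntax.

```python
def generate_hyp(depth):
    """Apply the rule, choosing the recursive alternative depth-1 times."""
    if depth < 1:
        raise ValueError("<HYP> generates at least one hyphen")
    if depth == 1:
        return "-"                          # first alternative: recursion ends
    return "-" + generate_hyp(depth - 1)    # second alternative: reapply rule

def match_hyp(text, pos=0):
    """Return the index just past the <HYP> starting at pos, or None."""
    if pos >= len(text) or text[pos] != "-":
        return None
    rest = match_hyp(text, pos + 1)         # try the recursive alternative
    return pos + 1 if rest is None else rest

three = generate_hyp(3)
```

Generation chooses an alternative at each step; recognition tries the recursive alternative first and falls back to the single hyphen when no further hyphen follows.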

As  indicated  above,  this  syntax  can  be  used  in  an  algorithm  to 
generate  individual  notations.   The  algorithm  used  to  parse  a  given 
string  will  in  effect  be  the  inverse  of  the  rule-application 
procedure  outlined  above,  so  that  the  notation  will  be  accepted  as 
input  and  will  be  parsed  (i.e.,  accepted)  if  and  only  if  a  "path" 
through  the  syntax  can  be  found  such  that  the  path  terminates 
with  the  rule 

<WLN> ::= <IC> ... <TC>

An  algorithm  to  "turn  around"  a  procedure  of  this  kind  has  been 
discussed  by  Schneider  [9]. 

The workings of the syntax will be illustrated by tracing the path
which would be taken to generate the string T66_BMT&J. Starting
at <WLN>, one is led first to <IC>; T is generated as the terminal
symbol. The next symbol in the definition of <WLN> is <RD>; the
<LOC> <RSIZE> <RD> option must be generated in order to get the
substring of two ring sizes. <LOC> must generate <NULL> to account
for the suppressed _A locant, and <RSIZE> must generate <RNGMAG>
to get the single digit 6 as a ring size. Going through <RD> again,
generating <LOC> <RSIZE>, and proceeding as above generates the second
6 with a suppressed _A locant. Since no enclosed ring atoms nor
multicyclic points are cited, <ERA> and <MCP> terminate via the
<NULL> options in each case. In the case of <RSS> the first
option allowing recursion is required in order to cite the locant
for the M. The symbol <SPACE> terminates immediately with _;
<LOCMAG> generates <LETTER>, which terminates with B for the locant.
The symbol <SYMBOL> generates <WISLET>, which terminates with M.
Upon recursion through <RSS> the <NULL> option is generated. Since
there are no H-saturations or explicit unsaturations in the cipher
the <NULL> options are generated for <HSAT> and <UNSAT>. <ANTSAT>
generates first the string T<ANTSAT>; upon recursion the string
&<ANTSAT> is generated. This part of the generation is stopped by
generating the <NULL> option for <ANTSAT>. The generation is
completed by applying the rule <TC> ::= J. For the tree of this
generation, see Figure 29.
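The inverse direction, accepting a string, can be sketched in Python for a reduced form of the syntax. The sketch is an assumption-laden illustration: it covers only <IC> <RD> <RSS> <ANTSAT> <TC>, takes <ERA>, <MCP>, <HSAT> and <UNSAT> as null, allows only unmodified single-letter locants and one-digit ring sizes, and unrolls the recursive rules into loops. The function name accepts is hypothetical, and the traced string is read here as T66_BMT&J.

```python
RING_DIGITS = set("3456789")            # <RNGMAG>
WISLET = set("ABHKMNOPSVWXYEFGI")       # <WISLET>, including <HAL>

def is_locant_letter(ch):
    return "A" <= ch <= "W"             # <LETTER>

def accepts(s):
    """Accept strings of the reduced form <IC> <RD> <RSS> <ANTSAT> <TC>."""
    if not s or s[0] not in "LT":                       # <IC>
        return False
    i, rings = 1, 0
    while True:                                         # <RD>
        if i < len(s) and s[i] in RING_DIGITS:
            i += 1
        elif (i + 2 < len(s) and s[i] == "_"
              and is_locant_letter(s[i + 1]) and s[i + 2] in RING_DIGITS):
            i += 3
        else:
            break
        rings += 1
    if rings == 0:
        return False
    while True:                                         # <RSS>
        if i < len(s) and s[i] in WISLET:
            i += 1
        elif (i + 2 < len(s) and s[i] == "_"
              and is_locant_letter(s[i + 1]) and s[i + 2] in WISLET):
            i += 3
        else:
            break
    while i < len(s) and s[i] in "&T":                  # <ANTSAT>
        i += 1
    return i == len(s) - 1 and s[i] == "J"              # <TC>

ok = accepts("T66_BMT&J")
```

Like the syntax it is derived from, this acceptor is too permissive: it also accepts L_B6J, which is not a valid notation.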
Table I. Preliminary Syntax for Analysis of
Cyclic Portions of Wiswesser Notations

Wiswesser Notation String:

<WLN> ::= <IC> <RD> <ERA> <MCP> <RSS> <HSAT> <UNSAT> <ANTSAT> <TC> ;

Initial Character Code:

<IC> ::= L|T ;

Ring Declaration:

<RD> ::= <LOC> <RSIZE>|<LOC> <RSIZE> <RD> ;

Locant:

<LOC> ::= <NULL>|<SPACE> <LOCMAG> ;

<NULL> ::= ;

<SPACE> ::= _ ;

<LOCMAG> ::= <LETTER>|<LETTER> <MOD> ;

<LETTER> ::= A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W ;

<MOD> ::= <AMP>|<HYP>|<NULL> ;

<AMP> ::= &|& <MOD> ;

<HYP> ::= -|- <HYP> ;

Ring Size:

<RSIZE> ::= <RNGMAG>|- <NONZERO> <DIGIT> - ;

<RNGMAG> ::= 3|4|5|6|7|8|9 ;

<NONZERO> ::= 1|2|<RNGMAG> ;

<DIGIT> ::= 0|<NONZERO> ;

Enclosed Ring Atom:

<ERA> ::= <NULL>|<UBA>|<SBA>|<UBA> <SBA> ;

Unshared Bridge Atoms:

<UBA> ::= /<ATLOC> ;

<ATLOC> ::= <LOCMAG>|<LOCMAG> <ATLOC> ;

Shared Bridge Atoms:

<SBA> ::= <LOCMAG>|<LOCMAG> <SBA> ;

Multicyclic Point Atom:

<MCP> ::= <NULL>|<MCP2> <SPACE> <LOCMAG> ;

<MCP2> ::= +1<LOCMAG>|+1<MCP2> <LOCMAG> ;

[Note: n successive occurrences of +1 are written as the number n.]

Ring Segment Symbols:

<RSS> ::= <NULL>|<SPACE> <LOCMAG> <SYMBOL> <RSS>|<SYMBOL> <RSS> ;

<SYMBOL> ::= <WISLET>|- <ABC> <ABC> - ;

<WISLET> ::= A|B|H|K|M|N|O|P|S|V|W|X|Y|<HAL> ;

<HAL> ::= E|F|G|I ;

<ABC> ::= <LETTER>|X|Y|Z ;

Hydrogen Saturation:

<HSAT> ::= <NULL>|<SPACE> <LOCMAG> H <HSAT> ;

Unsaturation Code:

<UNSAT> ::= <NULL>|<SPACE> <LOCMAG> <USYM> <EXLOC> <UNSAT> ;

<USYM> ::= U|UU ;

<EXLOC> ::= <NULL>|- <LOCMAG> ;

Ampersand and T unsaturation:

<ANTSAT> ::= <NULL>|& <ANTSAT>|T <ANTSAT> ;

Terminal Character Code:

<TC> ::= J ;
FIGURE 20. ALGOL-60 PROGRAM FOR CONVERTING WISWESSER LINE NOTATIONS OF
LINEARLY FUSED RING SYSTEMS TO CONNECTION TABLES.


integer procedure DECODE(SIZE); comment Initial locant: (ILOC).
Number of rings: (N);

comment This procedure is for decoding linear strings of
fused rings coded in WLN. All ring sizes and the initial locant
of each ring are assumed to have been stored, by earlier parts of
the general program, in SIZE(R) and ILOC(R) respectively, where
0 < R ≤ N (N = the number of rings in the fused ring aggregate).
This procedure fills a two-dimensional array RING with the
enumeration of each atom in each ring. RING is so ordered that
within each row every atom is connected to its neighbor, with the
last atom in the row being connected to the first atom in the row,
i.e., RING(R,J) is connected to RING(R,J⊕1) and RING(R,J⊖1), where
⊕ and ⊖ define arithmetic operations modulo SIZE(R);

value N; integer USED, FUSED, N, K, R, J;

integer array SIZE, ILOC[1:N], RING[1:N,1:R];

begin

USED := 0;
R := 1;

GAMMA: K := ILOC[R];

J := 1;

BETA: RING[R,J] := K;

if K = USED then K := FUSED else begin

    if J = SIZE[R] then begin

        if R = N then go to STOP else begin

            FUSED := K;
            USED := ILOC[R];

            R := R+1; go to GAMMA end end else

    K := K+1 end;
DELTA: J := J+1; go to BETA;
STOP: end DECODE


Notes  for  Figures  21  through  25: 
Flow  Chart  Subroutine  To  Examine 
Storage  After  Processing  a  Wiswesser  Notation 


DEBUG - A Subroutine of WISCO

SYMTAB - A table generated at assembly time
by the assembler. The length of the
table is stored in a location called
SYMEND. Each word of the table
(SYMTAB(I)) has three fields: SYMTAB(I)₁
contains the name of a location or of a
storage area; SYMTAB(I)₂ contains the
address corresponding to the name;
SYMTAB(I)₃ contains the number of words
accessed by the name if a storage area
or a flag (all ones) if a named location
(i.e., the address of an instruction).


[symbol] encloses a message to be printed out on-line.

[symbol] encloses instructions for on-line input.

[symbol] encloses operations which are machine-dependent
and are indicated only in a general statement.

[symbol] contains comments intended to explain a
set of operations and create perspective.
Notes for Figures 26 through 28:
Flow Chart for Pseudo-Dynamic Storage Allocation


GIMME - A Subroutine of WISCO (Adaptable to a general
Storage Allocator)

Common cell SIZE contains either number of words
requested or first address of storage block being
returned. In the first case, control is returned
to calling point with table limits in SIZE. In
the second, the existence of the table indicated
is deleted from system records.

INUSE is a block of 20 locations used by GIMME to
store pointers to the tables it has created. Copies
of these pointers are returned via SIZE to the calling
routine.

[symbols] are used to black-box operations which are
machine-dependent, knowledge of which is unessential
to the understanding of GIMME's operation.
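The two uses of GIMME described above can be sketched in Python. This is a loose illustration, not the flow-charted routine: the class name Gimme, the methods request and release, and the pool parameters are hypothetical; the single common cell SIZE is split into two calls; and storage is handed out sequentially without reuse, which is an assumption made only to keep the sketch short.

```python
class Gimme:
    MAX_TABLES = 20   # INUSE is a block of 20 locations

    def __init__(self, pool_start, pool_size):
        self.next_free = pool_start
        self.pool_end = pool_start + pool_size - 1
        self.inuse = []   # (lower, upper) limits of the tables created

    def request(self, n_words):
        """First case: a number of words is requested; return table limits."""
        if len(self.inuse) >= self.MAX_TABLES:
            raise MemoryError("INUSE is full")
        lower = self.next_free
        upper = lower + n_words - 1
        if upper > self.pool_end:
            raise MemoryError("storage pool exhausted")
        self.inuse.append((lower, upper))
        self.next_free = upper + 1
        return lower, upper

    def release(self, first_address):
        """Second case: a block is returned; delete it from the records."""
        for limits in self.inuse:
            if limits[0] == first_address:
                self.inuse.remove(limits)
                return
        raise KeyError(first_address)

allocator = Gimme(100, 50)
ring_block = allocator.request(6)      # e.g. one word per benzene ring atom
other_block = allocator.request(10)
allocator.release(ring_block[0])
```

A ring block would be requested when a ring symbol is met and released when the ring is discarded from the ring stack.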
Boxes of the form [symbol] enclose comments which are intended
to explain an operation in the context of the end result desired
of the program.

Lower case subscripts u and l indicate the portion of a
word in INUSE containing respectively the upper and lower
limits of the corresponding tables. Upper case subscripts
L and R designate the left and right half respectively
of the words referred to. In general data will be stored
right-justified (i.e., in the right half).
APPENDIX

Structure  diagrams  for  Wiswesser  Notations 
(Roman  numerals  correspond  to  those  of  Notations) 